Knowledge-engineered Terminology (data)bases

Authors

Lee Gillam

Khurshid Ahmad

Abstract

Issues related to the representation of data associated with terms are considered with specific reference to two knowledge representation formalisms, frames and semantic networks.

Introduction

Techniques developed by researchers and professional software engineers for building knowledge-based expert systems, natural language processing systems, image understanding systems and planning systems have shown considerable promise in representing and applying knowledge electronically. Such representation of knowledge in, and its application by, a computer system helps in sharing the burden of interpreting a term; it is the interpretation that determines the ultimate use of a term. The electronic representation and application of knowledge is of substantial import to the ever-burgeoning terminology databases. Typically, the representation is effected through propositional logic or the so-called prototypical frames, and the application is effected through logical inference. This does not mean that all knowledge can be represented using predicate logic or that logical inferences are always the correct inferences. The best a logic-based or frame-based system can offer is help in indexing individual items of data (or knowledge) automatically, without the need for human intervention. One important feature of these knowledge representation schemes is that they help in economising information: information about a superordinate, for instance, is recorded once and is then automatically transmitted to all the subordinates and instances; all that is required is an indication that a hyponymous relationship exists between two items of information. Experienced terminologists may recall many laboriously drawn conceptual schemes that were ideal for discussing the structure and content of terminology databases with others.
Apart from a complex index entry in a database, following the ways of information-science classification systems (Dewey’s etc.), the conceptual schemes could be neither implemented, enforced nor used by others. The automatic indexing, which is linked to property inheritance, and the ways and means in which knowledge is structured through the use of logic-based and frame-based systems, help in the representation of knowledge, rather than its encryption, as in Morse-code type systems or data compression systems, or its encoding, as in currently available relational databases. A good example of an electronic automaton that can deal with knowledge is a knowledge-based expert system. Typically, such a system comprises a large number of knowledge fragments, including heuristics and well-elaborated domain-specific terms, that have been electronically represented and can be electronically applied. The literature on expert systems claims that knowledge, in however restricted a sense of the term, has somehow or other been 'engineered' on a computer system: starting from the raw material, the knowledge fragments obtained from experts, to its more symbolic form, the electronic representation, and on to its electronic application. Hence, the term knowledge engineering is used to refer to the methods, tools and techniques of rendering knowledge and its interpretation on and through a machine, the computer system. We believe that the data in a conventional large TDB (c. 10,000 terms) can only be effectively used, at least for some users, if the knowledge implicit in the data related to individual terms, or the knowledge implicit in the associated conceptual schemes, is made explicit through the use of, say, semantic networks or prototypical frames. Indeed, such a representation will enable knowledge engineers and information scientists to adopt and to use terminology databases.
Consider, for example, the development of the Unified Medical Language System (UMLS) by the US National Institutes of Health for the purposes of cataloguing and interpreting papers published by biomedical researchers. UMLS can access a contemporary conceptual 'map' of medicine (190,863 concepts in all), a terminology database (371,742 terms in total), together with a retrieval program that uses hypertext technology and a knowledge representation formalism, the so-called semantic network. A typical user of UMLS, usually a researcher looking for a learned paper, can navigate through the terminology databases in a manner which may appear novel to many terminology database users: for instance, starting from a disease family, the user can look at a specific disease within the family, move on to drugs that can cure the disease, and then to the names of people and their publications related to the disease. Such an operation, covering a great many terms, would require considerable effort on the part of a typical TDB user, but UMLS, through its semantic network that has pre-recorded the links (relations) between the various terms (the nodes in a semantic network), coupled with the explanatory hypertext-linked text excerpts, helps its user to 'double-click' through a network that may comprise literally thousands of terms. In this paper we present a knowledge-engineered terminology database (KETDB) which has been developed to store and retrieve terms related to (human) visual neurons. After discussing the strengths and weaknesses of the UMLS approach, we show how a KETDB can be created using a low-cost knowledge engineering software system, the Babylon System, sponsored by the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI). The use of semantic networks, with particular reference to John Sowa’s Conceptual Graphs, is discussed and the approaches are compared and contrasted.
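The link-following navigation described above for UMLS can be illustrated with a toy semantic network. The following is only a minimal sketch: the class, the relation labels (`has_member`, `treated_by`, `discussed_in`, `written_by`) and the placeholder node names are assumptions made for illustration and are not taken from UMLS itself.

```python
class SemanticNetwork:
    """Toy network: nodes are terms, links are labelled relations."""

    def __init__(self):
        # node -> list of (relation, target) pairs recorded in advance
        self._links = {}

    def add_link(self, source, relation, target):
        self._links.setdefault(source, []).append((relation, target))

    def follow(self, node, relation):
        """Return the targets reachable from `node` via one `relation` link."""
        return [t for r, t in self._links.get(node, []) if r == relation]

    def navigate(self, start, relations):
        """Follow a sequence of relations, one 'click' per relation."""
        frontier = [start]
        for relation in relations:
            next_frontier = []
            for node in frontier:
                for target in self.follow(node, relation):
                    if target not in next_frontier:
                        next_frontier.append(target)
            frontier = next_frontier
        return frontier


# Placeholder terms only; no medical facts are asserted here.
net = SemanticNetwork()
net.add_link("disease family F", "has_member", "disease D1")
net.add_link("disease family F", "has_member", "disease D2")
net.add_link("disease D1", "treated_by", "drug X")
net.add_link("disease D1", "discussed_in", "paper P1")
net.add_link("paper P1", "written_by", "author A")

drugs = net.navigate("disease family F", ["has_member", "treated_by"])
authors = net.navigate(
    "disease family F", ["has_member", "discussed_in", "written_by"]
)
```

Because the links are stored once, in advance, each step of the user's browsing amounts to a lookup rather than a scan over individual term records.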
UMLS

The programs and databases comprising UMLS help to provide consistent categorisation of medical concepts and enable retrieval and integration of information from multiple machine-readable biomedical information sources such as the US National Institutes of Health’s Medical Literature Analysis and Retrieval System (MEDLARS). UMLS can access a Metathesaurus, a database of information on concepts that appear in one or more biomedical sources, like MEDLINE (the on-line version of MEDLARS) and the Quick Medical Reference (QMR). These concepts have been hierarchically organised in a network of some 133 semantic types with major groupings for organisms, anatomical structures, biological information, chemicals, events, and so on. Also included are 49 relations, grouped into 5 major categories. (Figures relate to the 1994 version of the UMLS Metathesaurus.) A typical ‘user’ of UMLS can explore the semantic relations between the constituent terms via either a PC-based Browser or a Macintosh-based Hypertext system, both of which make use of the Metathesaurus.

Hypertext Browsing: Consider, for example, the term ‘Interneuron’, a type of neuron. Via a Macintosh Hypercard-based implementation of UMLS, the user can find the part of speech (SP), the semantic type (Typ), a definition of the concept ‘Interneuron’, a set of references for sources where the concept appears (Src) and the context hierarchy of the concept in the source selected from the Src field.

Figure 1: Browsing UMLS via Hypercard

Database browsing: Entering the query term ‘Retinal neurons’ into the DOS-based (Coach) Metathesaurus Browser (CMB) causes the Metathesaurus to be searched for concepts related to this query. These related concepts are presented to the user in ranked matching order in the ‘Main Concepts’ portion of the screen. From this, the user can select a particular concept and ‘Show’ items related to that concept, such as child terms, sibling terms, synonyms, variants and related terms.
When an item from this menu has been selected, the ‘Concept Definition’ is revealed. Figure 2 below shows the child terms and the Concept Definition for the selected concept ‘Retina’. The emphasis in conventional terminology literature is on hyponymies: the relationships between superordinate and subordinate terms. In a conventional termbank, however, the human browser has to judge which term is the parent (superordinate) and which the child (subordinate) by scanning the record formats of individual terms and finding these implicitly defined relationships. It is possible that the human browser is familiar with the parent, but not the child. For instance, the human browser is more than likely aware of the term ‘Retina’, but less likely to be knowledgeable about the child terms, e.g. Fundus Oculi or Macula Lutea (cf. Figure 2).

Figure 2: Browsing UMLS using the Coach Metathesaurus Browser

The UMLS literature refers to the so-called semantic networks. Unfortunately, this term has a widespread yet non-standardised usage: for some, a semantic network is any network where nodes and links can be organised by a computer system with reference to epistemological primitives (e.g. Quillian 1966). The representational and inferential power of this kind of semantic network has been challenged by, among others, Brachman (1979), and indeed the author has attempted to formalise the whole notion of a semantic network. An earlier attempt at grounding this notion of networks can be traced back to Marvin Minsky’s frames of reference: using the metaphor of a single frame of film, Minsky argues that knowledge can be encoded in packets, and the frames embedded in a retrieval network such that if one frame is accessed, indexes to other potentially relevant frames become available (see Maida, 1994, for a survey of frames).
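To make the contrast with implicitly defined relationships concrete, here is a small sketch in which each term is held in a frame-like record whose parent and child links are explicit slots, so that accessing one frame makes the related frames available. The `Frame` class and its method names are hypothetical; only the terms ‘Retina’, ‘Fundus Oculi’ and ‘Macula Lutea’ come from the text.

```python
class Frame:
    """A term record whose hyponymy links are stored as explicit slots."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # superordinate frame, if any
        self.children = []        # subordinate frames, filled in on creation
        if parent is not None:
            parent.children.append(self)

    def child_terms(self):
        return [child.name for child in self.children]

    def sibling_terms(self):
        if self.parent is None:
            return []
        return [c.name for c in self.parent.children if c is not self]


retina = Frame("Retina")
fundus_oculi = Frame("Fundus Oculi", parent=retina)
macula_lutea = Frame("Macula Lutea", parent=retina)
```

With the links recorded in this way, a browser that knows only the parent term can ask for `retina.child_terms()` instead of scanning individual records for the relationship.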
These systems are established in that they are, to use Jackson’s terminology, logically and heuristically better grounded than ordinary semantic networks. It is the emphasis on encoding and indexing knowledge (fragments) in frames, together with its philosophical underpinnings expressed in terms like frames of reference, that has motivated us to use frames for representing and retrieving terms.

‘Representing’ neurons using semantic networks

A ‘neuron’, synonymously known as a ‘nerve cell’, typically has two kinds of appendage: arboreal dendrites and a tubular axon. Dendrites can be apical or basal, or potentially both, and the axon myelinated or unmyelinated, with a diameter between 0.2 μm and 20 μm. There are, however, neurons without dendrites and neurons without axons. Indeed, there are neurons where a single appendage acts as both axon and dendrite. Neurons have been classified by their various features, such as polarity, function and energy release mechanism. One of the key neuronal groupings within the visual system is the photoreceptors, which are located specifically in the retina. They conduct by membrane potential and convert light into electrical signals. An oft-encountered taxonomy of neurons that includes visual neurons and interneurons can be viewed as a link-and-node cluster, a typical semantic network in effect, where the link is an is-a link and the nodes are essentially terms (see Figure 3).

Figure 3: A simple taxonomic hierarchy

What is important here is that one can draw inferences about the various specialised neurons or indeed about the specialised cell, the neuron. The principal inference is that whatever is true of a cell is true of a neuron, and whatever is true of a neuron is true of an interneuron, a photoreceptor and a ganglion cell; the is-a link helps in the transmission of properties down the hierarchy.
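The transmission of properties down is-a links can be sketched in a few lines. This is an illustrative toy, not the implementation used for the KETDB: the `Node` class is hypothetical, and the property names (`has_membrane`, `synonym`, `location`) are chosen only to show where each fact is recorded.

```python
class Node:
    """A term in an is-a hierarchy; each property is stored exactly once."""

    def __init__(self, term, is_a=None, **properties):
        self.term = term
        self.is_a = is_a              # link to the superordinate node
        self.properties = properties  # facts recorded locally at this node

    def lookup(self, name):
        """Search this node, then climb the is-a links until a value is found."""
        node = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            node = node.is_a
        raise KeyError(name)


cell = Node("cell", has_membrane=True)            # illustrative property
neuron = Node("neuron", is_a=cell, synonym="nerve cell")
interneuron = Node("interneuron", is_a=neuron)
photoreceptor = Node("photoreceptor", is_a=neuron, location="retina")
ganglion_cell = Node("ganglion cell", is_a=neuron)
```

Here `interneuron.lookup("synonym")` is answered from the `neuron` node and `photoreceptor.lookup("has_membrane")` from the `cell` node, while `location` is known only for the photoreceptor because it was recorded at that level and not higher up.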
Each item of information is represented only once, leading to a considerable saving in memory and, perhaps more importantly, in managing changes to a knowledge base. But this inheritance is not well controlled and sometimes leads to inferences such as ‘a penguin is a bird, and all birds fly, so a penguin can fly too’. There is a need to control the inheritance of properties, allow for exceptions and, in effect, formalise the notion of a semantic network. This need has led to two developments: frames and a host of formalisations of the semantic network concept. Brachman and Schmolze’s (1985) KL-ONE is an example of the latter. In KL-ONE, concepts are treated as objects with important internal structure. Elements of the structure are represented by nodes or ‘roles’. It is these roles, along with role names, constraints and the relationships among the roles, that define the essence of the concept. Roles are similar to the slots of frames; however, in addition to being inherited, they can be differentiated into subroles which can be further constrained and placed in certain relationships. Figure 4 below shows the concept of a retina in KL-ONE. The RoleD links are role definitions and the V/R (value restriction) links place value restrictions on the roles. The relationships required between the various roles would be described in the ‘Structural Description’. One of our value restrictions, for example, might be that of connectivity with other neurons, in that ‘rods’ or ‘cones’ only connect to certain types of cell, or that they only operate during light or dark conditions.
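Two of the ideas above, exceptions to inherited properties and value restrictions on a slot, can be sketched together as follows. This is a loose illustration in the spirit of frame slots and KL-ONE roles and is not KL-ONE itself; the `Concept` class, the slot names and the allowed values are assumptions made for the example.

```python
class Concept:
    """Concept with inherited slot values, local overrides and restrictions."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.values = {}        # slot -> value stored locally
        self.restrictions = {}  # slot -> set of permitted values

    def restrict(self, slot, allowed):
        self.restrictions[slot] = set(allowed)

    def _allowed(self, slot):
        # The nearest restriction up the hierarchy applies (also inherited).
        node = self
        while node is not None:
            if slot in node.restrictions:
                return node.restrictions[slot]
            node = node.parent
        return None

    def set(self, slot, value):
        allowed = self._allowed(slot)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} not permitted for {slot!r}")
        self.values[slot] = value

    def get(self, slot):
        # A local value overrides whatever would otherwise be inherited.
        node = self
        while node is not None:
            if slot in node.values:
                return node.values[slot]
            node = node.parent
        raise KeyError(slot)


bird = Concept("bird")
bird.set("can_fly", True)            # default for birds
penguin = Concept("penguin", parent=bird)
penguin.set("can_fly", False)        # exception recorded locally
robin = Concept("robin", parent=bird)

photoreceptor = Concept("photoreceptor")
photoreceptor.restrict("operates_in", {"light", "dark"})  # assumed values
rod = Concept("rod", parent=photoreceptor)
rod.set("operates_in", "dark")
```

In this sketch `robin.get("can_fly")` still returns the default from `bird`, whereas `penguin.get("can_fly")` returns the local exception, and an attempt such as `rod.set("operates_in", "twilight")` is rejected by the inherited restriction.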

Publication date: 2007
